AMENDMENTS TO THE CLAIMS

1. (Currently amended) A method for performing a supervised learning process in an
artificial intelligence environment including optimizing a database of sample records for 
the training and testing of a prediction algorithm for predicting the presence or absence of a 
specified medical condition in a patient, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each, each of said prediction 
algorithms being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a 
set of one or more second generation prediction algorithms and assigns a fitness score to each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, where said termination event is at least one of 

a prediction algorithm is generated with a fitness score equal to or exceeding a
defined minimum value,

the maximum fitness score of successive generational sets of prediction
algorithms converging to a given value, and

a certain number of generations having been generated;

selecting a prediction algorithm having a best fitness score; and 

using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm, 

wherein 

said method is performed using a computer and computer software forming an 
intelligent system, and 

the trained prediction algorithm is effective to predict output variables for data 
relating to said condition, thereby predicting diagnosis of said condition,
and further comprising the steps of:

generating a population of prediction algorithms, where each one of said prediction
algorithms is trained and tested according to a different distribution of the records of the data
set in the complete database onto a training data set and a testing data set,

each different distribution being created as one of a random distribution and a
distribution formed by a deterministic mathematical process characterized as a
pseudorandom distribution,

each prediction algorithm of the said population being trained according to its
own distribution of records of the training set and being validated in a blind way
according to its own distribution on the testing set,

a score reached by each prediction algorithm being calculated in the testing
phase representing its fitness;

providing an evolutionary algorithm which combines the different models of
distribution of the records of the complete data set in a training and in a testing set, which sets
are represented each one by a corresponding prediction algorithm trained and tested on the
basis of the said training and testing data set, according to the fitness score calculated in the
previous step for the corresponding prediction algorithm,

the fitness score of each prediction algorithm corresponding to one of the
different distributions of the complete data set on the training and the testing
data sets being the probability of evolution of each prediction algorithm or of
each said distribution of the complete data set on the training and testing data
sets;

repeating the evolution of the prediction algorithm generation for a finite number of
generations or till the output of the genetic algorithm converges to a best solution and/or till the
fitness value of at least some prediction algorithm related to an associated data records
distribution has reached a desired value; and

setting the data records distribution for the best solution as the optimized training and
testing subsets for training and testing prediction algorithm.
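
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names `fitness` and `evolve_split`, the toy nearest-mean classifier standing in for the prediction algorithm, fitness-proportional selection, one-point crossover, and the 0.1 mutation rate are all assumptions chosen for brevity. Each individual is a binary mask that assigns every record to the training set (1) or the testing set (0), and only two of the three termination events (a fitness threshold and a generation limit) are shown.

```python
import random

def fitness(mask, records):
    """Train a toy nearest-mean classifier on records whose mask bit is 1,
    test it blindly on records whose mask bit is 0, return test accuracy."""
    train = [r for r, m in zip(records, mask) if m == 1]
    test = [r for r, m in zip(records, mask) if m == 0]
    if not test:
        return 0.0  # nothing to test on
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        if not xs:
            return 0.0  # cannot train without both classes present
        means[label] = sum(xs) / len(xs)
    hits = sum(1 for x, y in test
               if min(means, key=lambda c: abs(x - means[c])) == y)
    return hits / len(test)

def evolve_split(records, pop_size=20, max_generations=30, target=0.95, seed=0):
    """Evolve train/test distributions; return (best mask, best fitness)."""
    rng = random.Random(seed)
    n = len(records)  # assumed to be at least 2
    # First generation: random distributions of records onto train/test.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_mask, best_score = None, -1.0
    for _ in range(max_generations):  # termination: generation limit
        scores = [fitness(m, records) for m in population]
        for m, s in zip(population, scores):
            if s > best_score:
                best_mask, best_score = m[:], s
        if best_score >= target:  # termination: fitness threshold reached
            break
        # Fitness acts as the probability of being selected for reproduction.
        weights = [s + 1e-9 for s in scores]
        children = []
        while len(children) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < 0.1:  # occasional single-gene mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        population = children
    return best_mask, best_score
```

The returned mask is the distribution that would then be used for the final supervised training and testing; records are assumed here to be `(input, label)` pairs with labels 0 and 1.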

2. (Cancelled) 
3. (Previously presented) A method according to claim 1, characterised in that to each
record of the data set a distribution variable is associated which is binary and has at least two
statuses, one of these statuses being associated with the inclusion of the record in the training set
and the other with its inclusion in the testing set.

4. (Previously presented) A method according to claim 1, characterised in that the
prediction algorithm is an artificial neural network. 

5. (Previously presented) A method according to claim 1, characterised in that the 
prediction algorithm is a classification algorithm. 

6. (Previously presented) A method according to claim 1, characterised in that once an
optimum distribution has been computed, the optimised training data subset is treated as a
complete data set, the individuals included in the training subset being distributed onto a new
training set and onto a new testing set, each one having about half of the records of the
original optimized training set, while the originally optimized testing set is used as a third data
subset for validation purposes.
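
The three-way split of claim 6 can be sketched as follows. The function name `three_way_split` and the random halving of the optimized training subset are assumptions for illustration only; claim 7 contemplates optimizing that second distribution rather than drawing it at random.

```python
import random

def three_way_split(records, mask, seed=0):
    """Split records into (new_train, new_test, validation).

    mask[i] == 1 marks a record of the optimized training subset;
    mask[i] == 0 marks a record of the optimized testing subset,
    which is kept aside as the validation subset.
    """
    rng = random.Random(seed)
    optimized_train = [r for r, m in zip(records, mask) if m == 1]
    validation = [r for r, m in zip(records, mask) if m == 0]
    shuffled = optimized_train[:]
    rng.shuffle(shuffled)  # assumed: a plain random halving
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:], validation
```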

7. (Previously presented) A method according to claim 6 characterised in that the 
distribution of the data of the originally optimized training set onto the new training and new 
testing set is optimized by means of a pre-processing phase including the steps of said method 
for optimizing a database of sample records, said records being records in the originally 
optimized training set. 

8. (Previously presented) A method according to claim 1, in which different choices of 
the structure of the training subset and the structure of the testing subset consist in different 
selections of the number of input variables of the data records of the database, which selections 
consist in leaving out at least one, preferably two or more variables from the entire input 
variable set forming each record, the records of the database comprising a certain number of 
known input variables and a certain number of known output variables. 
9. (Previously presented) A method according to claim 8, characterised by the
following steps: 

defining a distribution of data from the complete data set onto a training data set and 
onto a testing data set; 

generating a population of different prediction algorithms each one having a training 
and/or testing data set in which only some variables have been considered among all the 
original variables provided in the data sets, each one of the prediction algorithms being 
generated by means of a different selection of variables; 

carrying out learning and testing of each prediction algorithm of the population and 
evaluating the fitness score of each prediction algorithm; 

applying an evolutionary algorithm to the population of prediction algorithms for
achieving new generations of prediction algorithms;

for each generation of new prediction algorithms, each one representing a new different
selection of input variables, the best prediction algorithm according to the best hypothesis of
input variable selection is tested or validated;

a fitness score is evaluated and the prediction algorithms representing the selections of
input variables which have the best testing performance and the minimum number of input variables are
promoted for the processing of the new generations.
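
The promotion rule of claim 9 can be illustrated with a short sketch. The names `project` and `promote`, and the choice to rank first by testing accuracy and only then by the number of selected input variables, are assumptions; the claim does not state how the two criteria are weighed against each other.

```python
def project(record_inputs, selection):
    """Keep only the input variables whose selection flag is 1."""
    return [v for v, keep in zip(record_inputs, selection) if keep]

def promote(candidates, keep=2):
    """Rank (selection, test_accuracy) pairs and return the best `keep`.

    Higher accuracy wins; among equal accuracies, the selection using
    fewer input variables wins (assumed tie-break order).
    """
    ranked = sorted(candidates, key=lambda c: (-c[1], sum(c[0])))
    return ranked[:keep]
```

For example, two selections with the same testing accuracy are ordered so that the one leaving out more input variables is promoted first.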



10. (Previously presented) A method according to claim 8, further comprising a pre- 
processing phase, including the steps of said method for optimizing a database of sample 
records, for selecting the most predictive input variables. 

11. (Currently amended) A method according to claim 1,

in which different choices of the structure of the training subset and the structure of the 
testing subset consist in different selections of the number of input variables of the data records 
of the database, which selections consist in leaving out at least one, preferably two or more 
variables from the entire input variable set forming each record, the records of the database 
comprising a certain number of known input variables and a certain number of known output 
variables, 

and further comprising a pre-processing phase, including the steps of said method for 
optimizing a database of sample records, for selecting the most predictive input variables, 
characterised in that the database subjected to the pre-processing phase of input
variable selection is a training subset and a testing subset processed with said method. 

12. (Currently amended) A method according to claim 1, characterised in that the
complete database the distribution of the records of which has to be optimized has data records 
having a selected number of input variables, the selection being carried out with said method, 
and in which different choices of the structure of the training subset and the structure of the 
testing subset consist in different selections of the number of input variables of the data records 
of the database, which selections consist in leaving out at least one, preferably two or more 
variables from the entire input variable set forming each record, the records of the database 
comprising a certain number of known input variables and a certain number of known output 
variables. 

13. (Previously presented) A method according to claim 1, characterised in that a pre-
processing phase for optimizing the distribution of the records on a training subset and a testing
subset and a pre-processing phase for selecting the most predictive input variables are carried out
alternately, one after the other, several times.

14. (Previously presented) A method according to claim 1 characterised in that the 
evolutionary algorithm is a genetic algorithm with the following evolutionary rules: 

an average health value of the population is computed as a function of the fitness values 
of each single individual in the population; 

coupling, recombination of genes and mutation of genes are carried out in a 
differentiated manner depending on a comparison between the fitness of each individual of the 
couple and the average health value of the entire population to which the individuals belong; 

individuals having a fitness value lower than or equal to the average health of the entire
population are not excluded from the creation of new generations but are marked out and 
entered in a vulnerability list; 

the number of subjects entered in the vulnerability list defining the number of possible 
marriages. 
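
The average health and vulnerability list of claim 14 can be sketched as below. The claim only says the average health is "a function of the fitness values"; taking the arithmetic mean is an assumption, as is the function name `vulnerability_list`.

```python
def vulnerability_list(fitnesses):
    """Return (average_health, vulnerable_indices, possible_marriages).

    Individuals whose fitness is lower than or equal to the average
    health are not discarded; their indices are recorded, and the
    length of that list gives the number of possible marriages.
    """
    average_health = sum(fitnesses) / len(fitnesses)  # assumed: arithmetic mean
    vulnerable = [i for i, f in enumerate(fitnesses) if f <= average_health]
    return average_health, vulnerable, len(vulnerable)
```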
15. (Previously presented) A method according to claim 14 in which for coupling
purposes and for generation of children at least one parent individual must have a fitness value
greater than the average health value of the population.

16. (Previously presented) A method according to claim 14, characterised in that each
couple of individuals can generate individuals having a fitness different from the average
health, so-called offspring, if the fitness of at least one of them is greater than the average fitness,
the offspring of each marriage occupying the places of subjects entered in the vulnerability list
and marked out, so that a weak individual can continue to exist through his own children.

17. (Previously presented) A method according to claim 14, characterised in that
coupling between individuals having a very low fitness value and a very high fitness value is
not allowed.

18. (Previously presented) A method according to claim 14, characterised in that the
following recombination rules of the genes of the coupled parent individuals are considered in
the case where the parent individuals have no common genes:

the health of the father and mother individuals is greater than the average health of the
entire population; in this case the crossover is a classical crossover according to which the genes of the
father and of the mother individuals are substituted one with the other starting from a certain
crossover point;

the health of the father and mother individuals is lower than the average health of the
entire population; in this case the two children are formed through rejection of the parents'
genes they will receive by the crossover process;

the health of one of the parents is less than the average health of the entire population
while the health of the other parent is greater than the average health of the entire population; in
this case only the parent whose health is greater than the average health of the entire
population will transmit its genes, while the genes of the parent having a health lower than
the average health of the entire population are rejected.
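
One possible reading of the three recombination cases of claim 18 is sketched below for binary genes. This is an interpretation, not the claimed method itself: gene rejection is modeled as flipping a binary gene to its other status (in line with the status change described in claim 19), a parent whose health equals the average is treated as weak, and the names `flip` and `recombine` are hypothetical.

```python
def flip(genes):
    """Reject binary genes by switching each one to its other status."""
    return [1 - g for g in genes]

def recombine(father, mother, health_f, health_m, average_health, cut):
    """Return two children according to the three cases, as read here."""
    strong_f = health_f > average_health
    strong_m = health_m > average_health
    if strong_f and strong_m:
        # Case 1: classical one-point crossover.
        return father[:cut] + mother[cut:], mother[:cut] + father[cut:]
    if not strong_f and not strong_m:
        # Case 2: children reject the genes they receive by crossover.
        return (flip(father[:cut] + mother[cut:]),
                flip(mother[:cut] + father[cut:]))
    # Case 3: only the strong parent transmits genes as they are;
    # the weak parent's genes are rejected before crossover.
    f = father if strong_f else flip(father)
    m = mother if strong_m else flip(mother)
    return f[:cut] + m[cut:], m[:cut] + f[cut:]
```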
19. (Previously presented) A method according to claim 18, wherein each gene is
characterised by a status level, the method further characterised in that gene rejection consists
in modifying the status of the genes from one status level to a different status level.

20. (Previously presented) A method according to claim 18, characterised in that a
modified crossover of the genes of the parent individuals is carried out when the parent
individuals have part of their genes that coincide, this modified crossover providing for generating
an offspring in which the genes selected for crossover are the most effective ones of the
parents.

21. (Previously presented) A method according to claim 14 in which the individuals
are the different prediction algorithms, each representing a corresponding different initial random
distribution of data records onto the testing and the training data set, and the genes consist in the
binary status variable of association of each record to the training and to the testing subset.

22. (Previously presented) A method according to claim 14 in which the individuals 
are the prediction algorithms each one representing a different training and testing data set, the 
difference residing in a different selection of input variables for each different training and 
testing subset, and the genes consist in the different selection variable which is provided for 
each input variable in the different training and testing subsets, the above mentioned selection 
variable being a parameter indicating the presence/absence of each corresponding input 
variable in the records of each data set. 

23. (Previously presented) A method according to claim 1 characterized in that it is in 
the form of a software program comprising instructions executable by a CPU, the software 
program being stored in a memory to which the CPU can access. 

24. (Previously presented) A software program stored on a memory device, the said
software program consisting in the method according to claim 1 in the form of instructions
executable by a CPU or by a computer system.
25. (Previously presented) A system for carrying out a method according to claim 1
comprising an apparatus or device for generating an action of response which is autonomously, 
i.e. by itself, chosen among a certain number of different kinds of actions of response stored in 
a memory of the apparatus or autonomously generated by the apparatus basing the said choice 
of the kind of action of response on the interpretation of data collected autonomously by means 
of one or more sensors responsive to physical entities or which are fed to the apparatus by 
means of input means, the said interpretation being made by means of a prediction algorithm in 
the form of a software saved in a memory of the said apparatus and being carried out by a 
central processing unit, characterized in that the apparatus is further provided with means
for carrying out a training and testing phase of the prediction algorithm by inputting to the said
prediction algorithm data of a known database in which input variables of the input data
representing the physical entities able to be sensed by the apparatus through the one or more
sensors and/or able to be fed to the apparatus by means of the input means are univocally
correlated to at least one definite kind of action of response among the different kinds of
possible action of response, the said means for carrying out the training and testing being in the
form of a training and testing software saved in a memory of the apparatus, the said training 
and testing being carried out by means of a method according to claim 1, the said training and 
testing software program being the said method of training and testing in the form of a software 
program or instructions. 

26. (Previously presented) The system according to claim 25, characterized in that it is 
a system for sound or vocal recognition comprising input means responsive to acoustic waves, 
a processing unit connected to the input means responsive to acoustic waves, at least a memory 
in which a software program is stored the said program being in the form according to claims 
23 or 24 and comprising coded instructions for enabling the processing unit to carry out a 
method according to claim 1, a further or the same above mentioned memory in which a dataset 
of known data records is stored or can be stored and/or input means for storing in the further or 
the said above mentioned memory a dataset of known data records. 

27. (Previously presented) The system according to claim 25, characterized in that it is 
a system for image recognition, the input means being responsive to electromagnetic waves,
the system being able to recognize the shape of an object generating or reflecting
electromagnetic waves, and/or the distance and/or the identity of the object. 

28. (Previously presented) The system according to claim 26, characterized in that the
database of known data records comprises acoustic signals emitted by one or more objects or
one or more living beings forming part of the typical environment in which the device has to
operate, or the data relating to one or more images of one or more objects or one or more living
beings forming part of the typical environment in which the device has to operate, which are
univocally correlated to the corresponding known kind and/or identity and/or meaning of the objects
to which the said acoustic signals or image data are related and/or from which the said acoustic
signals or image data are generated.

29. (Previously presented) The system according to claim 27, characterized in that it is 
a specialized system for image pattern recognition having artificial intelligence utilities for 
analyzing a digitalized image, i.e. an image in the form of an array of image data records, each
image data record being related to a zone or point or unitary area or volume of a two or three 
dimensional visual image, so called pixel or voxel of a visual image, the said visual image 
being formed by an array of the said pixels or voxels and utilities for indicating for each image 
data record a certain quality among a plurality of known qualities of the image data records; 

the system having a processing unit such as, for example, a conventional computer, a memory
in which an image pattern recognition algorithm is stored in the form of a software program 
which can be executed by the processing unit; 

a memory in which a certain number of predetermined different qualities which the
image data records can assume has been stored, which qualities have to be univocally
associated to each of the image data records of an image data array fed to the system;

input means for receiving arrays of digital image data records or input means for 
generating arrays of digital image data records from an existing image and a memory for 
storing the said digital image data array; 

output means for indicating for each image data record of the image data array a certain 
quality chosen by the processing unit in carrying out the image pattern recognition algorithm in 
the form of the said software program; 
the image pattern recognition algorithm is a prediction algorithm in the form of a
software program, which prediction algorithm is further associated to a system being further 
provided with a training and testing software program; 

the system is able to carry out training and testing according to the method of claim 1; 

the method is provided in the system in the form of the training and testing software 
program; 

a database being also provided in which data records are contained univocally
associating known image data records of known image data arrays with the corresponding 
known quality from a certain number of predetermined different qualities which the image data 
records can assume. 



30. (Previously presented) A method for producing a microarray for genotyping 
operations, the said method comprising the steps of defining a certain number of theoretically 
relevant genes or alleles or polymorphisms considered relevant for a certain biologic condition 
like a tissue structure, a pathology or the potentiality of developing a pathology or an anatomic 
or morphologic feature: 

a) providing a database of experimentally determined data in which each record relates 
to a known clinical or experimental case of a sample population of cases and which records 
comprise a certain number of input variables corresponding to the presence/absence of a certain 
predetermined number of polymorphisms and/or mutations and/or equivalent genes of a certain 
number of theoretically probable relevant genes, said certain predetermined number of 
polymorphisms and/or genes forming a set, and one or more related output variables 
corresponding to the certain biological or pathologic condition of the said clinical and 
experimental cases of the sample population; 

characterized by the following further steps: 

b) determining a selection of a subset of the set of certain predetermined number of 
polymorphisms and/or genes by testing the association of the said genes or polymorphisms and 
the biological or pathological condition by means of mathematical tools applied to the
database;

c) the said mathematical tools comprise a so called prediction algorithm such as a so 
called neural network; 

and the further steps are carried out of: 
d) dividing the database into a training and a testing dataset for training and testing the
prediction algorithm; 

e) defining two or more different training datasets each one having records with a set of 
input variables obtained by excluding one or more input variables from the originally defined 
number of input variables, while for each record the set of input variables of the corresponding 
training set has at least one input variable which is not a member of the set of input variables of 
the other training datasets, each said at least one input variable consisting in a different gene or 
a different polymorphism and/or a different mutation and/or a different functionally equivalent
gene thereof of the originally considered genes or polymorphisms and/or mutations and/or 
functionally equivalent genes thereof considered theoretically potentially relevant for the 
biologic or pathologic condition; 

f) training the prediction algorithm with each of the different training sets defined under
point e) for generating a first population of different prediction algorithms which are divided
into two groups of mother and father prediction algorithms and testing the said prediction
algorithms with the associated testing set;

g) calculating a fitness score or prediction accuracy of each father and mother 
prediction algorithms of the said first population by means of the testing results; 

i) providing a so-called evolutionary algorithm such as a genetic algorithm and applying
the evolutionary algorithm to the first population of mother and father prediction algorithms for
achieving a new generation of prediction algorithms whose training and testing dataset comprises
records whose input variables selections are a combination of the input variable selections of 
the records of the training and of the testing datasets of the first or previous population of father 
and mother prediction algorithms according to the rules of the evolutionary algorithm; 

j) for each generation of new prediction algorithms representing each new variant 
selection of input variables, the best prediction algorithm according to the best hypothesis of 
input variable selection is tested or validated by means of the testing dataset; 

k) a fitness score is evaluated and the prediction algorithms representing the selections 
of input variables which have the best testing performance with the minimum number of input 
variables utilized are promoted for the processing of new generations; 

l) repeating the steps i) to k) until a predetermined fitness score defined as best fit of the
prediction algorithm and a minimum number of input variables has been reached; 
m) defining as the selected relevant input variables, i.e. as the relevant genes or
polymorphisms and/or mutations and/or functionally equivalent genes thereof, the ones
related to the input variables of the selection represented by the prediction algorithm having 
both at least the predetermined fitness score and also the minimum number of selected input 
variables. 

31. (Previously presented) A method according to claim 30, characterized in that an
optimization of the distribution of the records of the original database in a training dataset and
in a testing dataset is carried out in one of a pre-processing and a post-processing phase, i.e.
before carrying out the steps e) to m) at step d) or after having carried out the steps a) to m).

32. (Previously presented) The method according to claim 31 comprising the following
steps of optimisation: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a 
set of one or more second generation prediction algorithms and assigns a fitness score to each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs; 

where said termination event is at least one of a prediction algorithm is generated with a 
fitness score equalling or exceeding a defined minimum value, the maximum fitness score of 
successive generational sets of prediction algorithms converging to a given value, and a certain 
number of generations having been generated. 
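
The three termination events recited above can be sketched as a single check over the history of best fitness scores. The function name `should_terminate` and the convergence criterion (the last `patience` maxima differing by at most `tol`) are assumptions; the claim does not define how convergence "to a given value" is detected.

```python
def should_terminate(best_per_generation, min_fitness, max_generations,
                     tol=1e-6, patience=3):
    """Return True if at least one termination event has occurred.

    best_per_generation holds the maximum fitness score of each
    generational set produced so far, oldest first.
    """
    if not best_per_generation:
        return False
    # Event 1: a fitness score equal to or exceeding the defined minimum.
    if best_per_generation[-1] >= min_fitness:
        return True
    # Event 3: a certain number of generations has been generated.
    if len(best_per_generation) >= max_generations:
        return True
    # Event 2: the maximum fitness of successive generations has converged
    # (assumed test: the last `patience` maxima lie within `tol`).
    recent = best_per_generation[-patience:]
    return len(recent) == patience and max(recent) - min(recent) <= tol
```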

33. (Currently amended) The method according to claim 31, comprising the following
steps:

generating a population of prediction algorithms, each one of which is trained and tested
according to a different distribution of the records of the data set in the complete database onto
a training data set and a testing data set;
each different distribution being created by a random distribution or a
distribution formed by a deterministic mathematical process characterized as a
pseudo-random distribution;

each prediction algorithm of the said population is trained according to its own
distribution of records of the training set and is validated in a blind way according to its own
distribution on the testing set;

a score reached by each prediction algorithm is calculated in the testing phase 
representing its fitness; 

an evolutionary algorithm being further provided which combines the different models 
of distribution of the records of the complete data set in a training and in a testing set which 
sets are represented each one by a corresponding prediction algorithm trained and tested on the 
basis of the said training and testing data set according to the fitness score calculated in the 
previous step for the corresponding prediction algorithm; 

the fitness score of each prediction algorithm corresponding to one of the different 
distributions of the complete data set on the training and the testing data sets being the 
probability of evolution of each prediction algorithm or of each said distribution of the 
complete data set on the training and testing data sets; 

repeating the evolution of the prediction algorithm generation for a finite number of 
generations or till the output of the genetic algorithm converges to a best solution and/or till the 
fitness value of at least some prediction algorithm related to an associated data records 
distribution has reached a desired value; 

setting the data records distribution for the best solution as the optimized training and 
testing subsets for training and testing prediction algorithm. 

34. (Previously presented) A microarray for genotyping comprising a reduced number 
of genes, alleles or polymorphisms characterized in that the reduced number of the said genes, 
alleles or polymorphisms has been selected by means of a method according to claim 30. 

35. (Previously presented) A method for performing a supervised learning process in 
an artificial intelligence environment including optimizing a database of sample records for the 
training and testing of a prediction algorithm for a problem under investigation characterized 
by input variables and output variables, the prediction algorithm used for predicting output
variables for real world data, the method comprising the steps of: 

defining a set of one or more distributions of the database records onto respective 
training and testing subsets; 

using the defined set of distributions to train and test a first generation set of one or 
more prediction algorithms and assigning a fitness score to each, each of said prediction 
algorithms being associated with a certain distribution of said database records; 

feeding the set of prediction algorithms to an evolutionary algorithm which generates a
set of one or more second generation prediction algorithms and assigns a fitness score to each; 

continuing to feed each generational set of prediction algorithms to the evolutionary 
algorithm until a termination event occurs, where said termination event is at least one of 

a prediction algorithm is generated with a fitness score equal to or exceeding a
defined minimum value,

the maximum fitness score of successive generational sets of prediction
algorithms converging to a given value, and

a certain number of generations having been generated;

selecting a prediction algorithm having a best fitness score; 

using the distribution of database records associated with said selected prediction 
algorithm in performing supervised learning, said supervised learning including training and 
testing of prediction algorithms to obtain a trained prediction algorithm; and 

using the trained prediction algorithm to predict the output variables relating to the 
problem under investigation where only the input variables are known, 

wherein said method is performed using a computer and computer software 
forming an intelligent system. 